NEWDATA(CAVM-2-0): Add var landCoverCat with attrs flag_values, flag_meanings, flag_descriptions, flag_colors; v20260331.nc #71

Merged: msteckle merged 11 commits into main from cavm on Apr 9, 2026

@msteckle msteckle commented Apr 1, 2026

This is a gridded land cover classification product, which is new to ILAMB, so how we format it may change over time.

The landCoverCat data are stored as 8-bit integers (type byte in the header below), and the variable carries the attributes flag_values and flag_meanings, which are the CF-standard way of mapping a numeric code to a string. We also add our own attribute, flag_descriptions, because CAVM provides an integer, a string code, and a string description for each land-cover category.

I went a little crazy and also added flag_colors, which are string hex codes to use for mapping. Mapping CAVM without their color palette is a nightmare, so better to have it handy.
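As a rough illustration of how flag_colors could be consumed for mapping, the sketch below pairs each flag value with its hex code and converts it to an RGB triple. The attribute values are copied from the landCoverCat header in this PR; the helper name hex_to_rgb and the palette dictionary are hypothetical and not part of ilamb3_data.

```python
# Attribute values copied from the landCoverCat header in this PR.
flag_values = [1, 2, 3, 4, 5, 21, 22, 23, 24, 31, 32, 33, 34, 41, 42, 43, 91, 92, 93, 99]
flag_colors = (
    "#d7d7b3 #a8a802 #a68282 #8282a0 #cdcd66 #ffebaf #ffd37f #e6e600 #ffff00 "
    "#dfb0b0 #db949e #97e602 #38a802 #9eedbd #73ffdf #04e6a9 #0070ff #e0f2ff "
    "#ffffff #cccccc"
)


def hex_to_rgb(code):
    """Convert '#rrggbb' to an (r, g, b) tuple of ints in 0..255 (hypothetical helper)."""
    code = code.lstrip("#")
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))


# One RGB triple per flag value, in the order the attributes list them.
palette = dict(zip(flag_values, (hex_to_rgb(c) for c in flag_colors.split())))

print(palette[91])  # colour assigned to FW (fresh_water)
```

A plotting library would typically want these triples scaled to 0..1 and ordered by flag value, which this dictionary makes straightforward.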

landCoverCat
netcdf ILAMB_UAF_CAVM-2-0_fx_landCoverCat_gr_v20260331 {
dimensions:
        lon = 30556 ;
        lat = 4271 ;
        bnds = 2 ;
variables:
        double lon(lon) ;
                lon:axis = "X" ;
                lon:units = "degrees_east" ;
                lon:standard_name = "longitude" ;
                lon:long_name = "Longitude" ;
                lon:bounds = "lon_bnds" ;
        double lat(lat) ;
                lat:axis = "Y" ;
                lat:units = "degrees_north" ;
                lat:standard_name = "latitude" ;
                lat:long_name = "Latitude" ;
                lat:bounds = "lat_bnds" ;
        byte landCoverCat(lat, lon) ;
                landCoverCat:_FillValue = -127b ;
                landCoverCat:units = "" ;
                landCoverCat:standard_name = "cover_category" ;
                landCoverCat:long_name = "Vegetation or Land-Cover Category" ;
                landCoverCat:flag_values = 1b, 2b, 3b, 4b, 5b, 21b, 22b, 23b, 24b, 31b, 32b, 33b, 34b, 41b, 42b, 43b, 91b, 92b, 93b, 99b ;
                landCoverCat:flag_meanings = "B1 B2a B3 B4 B2b G1 G2 G3 G4 P1 P2 S1 S2 W1 W2 W3 FW SW GL NA" ;
                landCoverCat:flag_descriptions = "cryptogam_herb_barren cryptogam_barren_complex non-carbonate_mountain_complex carbonate_mountain_complex cryptogam_barren_dwarf-shrub_complex graminoid_forb_cryptogam_tundra graminoid_prostrate_dwarf-shrub_forb_moss_tundra non-tussock_sedge_dwarf-shrub_moss_tundra tussock-sedge_dwarf-shrub_moss_tundra prostrate_dwarf-shrub_herb_lichen_tundra prostrate-hemi-prostrate_dwarf-shrub_lichen_tundra erect_dwarf-shrub_moss_tundra low-shrub_moss_tundra sedge-grass_moss_wetland_complex sedge_moss_dwarf-shrub_wetland_complex sedge_moss_low-shrub_wetland_complex fresh_water salt_water glacier non-arctic" ;
                landCoverCat:flag_colors = "#d7d7b3 #a8a802 #a68282 #8282a0 #cdcd66 #ffebaf #ffd37f #e6e600 #ffff00 #dfb0b0 #db949e #97e602 #38a802 #9eedbd #73ffdf #04e6a9 #0070ff #e0f2ff #ffffff #cccccc" ;
        double lat_bnds(lat, bnds) ;
        double lon_bnds(lon, bnds) ;

// global attributes:
                :Conventions = "CF-1.12 ODS-2.6" ;
                :activity_id = "ILAMB" ;
                :aux_uncertainty_id = "N/A" ;
                :contact = "Martha Raynolds (mkraynolds@alaska.edu)" ;
                :creation_date = "20260331" ;
                :data_specs_version = "2.6" ;
                :dataset_contributor = "Morgan Steckler" ;
                :doi = "https://doi.org/10.17632/c4xj5rv6kv.2" ;
                :frequency = "fx" ;
                :grid = "1x1 km Lambert Azimuthal Equal Area reprojected to latitude x longitude" ;
                :grid_label = "gr" ;
                :has_aux_unc = "FALSE" ;
                :history = "\n20260331: \"CMORized\" data from Raster CAVM 2.0 (downloaded from Mendeley Data)\n\n20260331: Reprojected to lat/lon and added metadata using ILAMB utilities\n" ;
                :institution = "University of Alaska, Fairbanks, USA" ;
                :institution_id = "UAF" ;
                :license = "https://creativecommons.org/licenses/by-nc/3.0/deed.en" ;
                :nominal_resolution = "1 km" ;
                :processing_code_location = "https://github.com/rubisco-sfa/ilamb3-data/tree/main/data/CAVM-2-0/convert.py" ;
                :product = "derived" ;
                :realm = "land" ;
                :references = "Raynolds, M.K., et al. (2019). A raster version of the Circumpolar Arctic Vegetation Map (CAVM). Remote Sensing of Environment, 232, 111297. https://doi.org/10.1016/j.rse.2019.111297" ;
                :region = "panarctic" ;
                :site_id = "N/A" ;
                :site_location = "N/A" ;
                :source = "Unsupervised classifications of seventeen geographic/floristic sub-sections of the Arctic, using AVHRR and MODIS data (reflectance and NDVI) and elevation data" ;
                :source_data_retrieval_date = "2026-03-30T12:06:45Z" ;
                :source_data_url = "https://data.mendeley.com/public-files/datasets/c4xj5rv6kv/files/5223c414-234a-498c-ae08-3100cb38510f/file_downloaded" ;
                :source_id = "CAVM-2-0" ;
                :source_label = "CAVM" ;
                :source_type = "satellite_retrieval" ;
                :source_version_number = "2.0" ;
                :table_id = "N/A" ;
                :title = "Raster Circumpolar Arctic Vegetation Map" ;
                :tracking_id = "hdl:21.14102/60f0d31b-3ce2-4d24-875a-e89614a04723" ;
                :variable_id = "landCoverCat" ;
                :variant_info = "CMORized product prepared by ILAMB" ;
                :variant_label = "ILAMB" ;
                :version = "v20260331" ;
}
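For a sense of scale, the dimension lengths in this header imply the uncompressed array sizes computed below. This is plain arithmetic on the lat and lon lengths; the float32 figure is included only for comparison, on the assumption that a reader may decode the byte data to a floating-point array in memory.

```python
# Dimension lengths copied from the header above.
n_lat, n_lon = 4271, 30556
n_cells = n_lat * n_lon

bytes_as_int8 = n_cells * 1     # 'byte' storage: one byte per cell
bytes_as_float32 = n_cells * 4  # four bytes per cell if decoded to float32

print(f"{n_cells:,} cells")
print(f"int8:    {bytes_as_int8 / 1e6:.1f} MB")
print(f"float32: {bytes_as_float32 / 1e6:.1f} MB")
```

The float32 figure agrees with the 522MB that the xarray summary later in this conversation reports for landCoverCat.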

…add opt to wipe all previous attrs/encoding; create doc string for set_var_attrs; add some validation to set_var_attrs; add flag_values, flag_meanings, extra_attrs params
…nt to map; mapping CAVM is a pain without it

msteckle commented Apr 1, 2026

I really need to update the dataset validator... It's pretty useless. The failed CI check here doesn't matter.

@msteckle msteckle requested review from Copilot and nocollier April 1, 2026 00:38
@msteckle msteckle self-assigned this Apr 1, 2026
@msteckle msteckle added the new dataset New script to process a dataset for ILAMB3 label Apr 1, 2026
@msteckle msteckle moved this from Backlog to In review in ILAMB Development Collective Apr 1, 2026
@msteckle msteckle linked an issue Apr 1, 2026 that may be closed by this pull request

Copilot AI left a comment

Pull request overview

Adds a new gridded land-cover categorical dataset (CAVM-2-0 landCoverCat) to the ILAMB data registry and introduces utility enhancements to better support categorical/flag metadata and compressed NetCDF output.

Changes:

  • Register a new NetCDF artifact for CAVM-2-0 landCoverCat in registry/data.txt.
  • Add a new conversion script to download, reproject, and write the landCoverCat product with CF flag metadata (plus extra palette/description attrs).
  • Extend ilamb3_data utilities: HTML downloads now optionally unzip, coordinates/variables support compression encoding, and set_var_attrs adds validation + flag handling.
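The flag handling mentioned in the last bullet can be illustrated with a small consistency check. This is only a sketch under stated assumptions: check_flag_attrs is a hypothetical name, and the actual validation inside set_var_attrs is not shown in this conversation and may differ.

```python
def check_flag_attrs(flag_values, flag_meanings, **extra_flag_attrs):
    """Raise ValueError unless each space-separated flag attribute
    has exactly one token per flag value (hypothetical helper)."""
    n = len(flag_values)
    if len(set(flag_values)) != n:
        raise ValueError("flag_values must be unique")
    attrs = {"flag_meanings": flag_meanings, **extra_flag_attrs}
    for name, text in attrs.items():
        tokens = text.split()
        if len(tokens) != n:
            raise ValueError(f"{name} has {len(tokens)} tokens, expected {n}")
    return True


# Passes: three values and three tokens in every attribute.
check_flag_attrs(
    [91, 92, 93],
    "FW SW GL",
    flag_descriptions="fresh_water salt_water glacier",
)

# Fails: a description containing a space splits into too many tokens.
try:
    check_flag_attrs([91, 92], "FW SW", flag_descriptions="fresh water salt_water")
except ValueError as err:
    print(err)
```

The failing case shows why the descriptions in this dataset join words with underscores: the attributes are whitespace-delimited lists, so a space inside one entry would shift every entry after it.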

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| registry/data.txt | Registers the new CAVM-2-0 landCoverCat NetCDF file and checksum. |
| ilamb3_data/__init__.py | Enhances download and attribute-setting utilities (zip extraction, compression, flag metadata, validations). |
| data/CAVM-2-0/convert.py | New pipeline to fetch/extract CAVM, reproject to lat/lon, set CF/ODS attrs, and write the output NetCDF (with plotting verification). |
Comments suppressed due to low confidence (1)

ilamb3_data/__init__.py:736

  • _FILL_VALUES attempts to support unicode strings with np.dtype('U'), but np.dtype('U') (length 0) won’t match typical dtypes like '<U32', and the kind-based fallback doesn’t handle final_dt.kind == 'U' (or object strings). This can lead to ValueError: No CF _FillValue defined... for valid string variables. Consider adding a final_dt.kind == 'U' fallback (and/or handling object/variable-length strings explicitly).
# default CF _FillValue options (see https://docs.unidata.ucar.edu/netcdf-c/current/file_format_specifications.html#classic_format_spec)
_FILL_VALUES = {
    np.dtype("S1"): np.bytes_(b"\x00"),  # char (fixed-length)
    np.dtype("U"): "",  # string (variable-length)
    np.int8: np.int8(-127),  # byte
    np.int16: np.int16(-32767),  # short
    np.int32: np.int32(2147483647),  # int
    np.float32: np.float32(1.0e20),  # float
    np.float64: np.float64(1.0e20),  # double
}
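One way to picture the kind-based fallback suggested in this comment is the sketch below. It is not the ilamb3_data implementation: default_fill_value and _FILL_BY_KIND_SIZE are hypothetical names, and the fill values are simply copied from the snippet quoted above. The lookup keys on the dtype kind and item size instead of the exact dtype, so any unicode width resolves to the same fill value.

```python
import numpy as np

# Fill values copied from the _FILL_VALUES snippet quoted above.
_FILL_BY_KIND_SIZE = {
    ("i", 1): np.int8(-127),
    ("i", 2): np.int16(-32767),
    ("i", 4): np.int32(2147483647),
    ("f", 4): np.float32(1.0e20),
    ("f", 8): np.float64(1.0e20),
}


def default_fill_value(dtype):
    """Hypothetical lookup keyed on dtype kind rather than exact dtype."""
    dt = np.dtype(dtype)
    if dt.kind == "U":  # any unicode width, e.g. '<U3' or '<U50'
        return ""
    if dt.kind == "S":  # fixed-length bytes
        return np.bytes_(b"\x00")
    key = (dt.kind, dt.itemsize)
    if key not in _FILL_BY_KIND_SIZE:
        raise ValueError(f"No CF _FillValue defined for dtype {dt}")
    return _FILL_BY_KIND_SIZE[key]


# The unsized unicode dtype is not equal to a sized one, which is why
# an exact-match lookup on np.dtype("U") would miss '<U32'.
print(np.dtype("U") == np.dtype("<U32"))
print(repr(default_fill_value("<U32")))
print(default_fill_value(np.int8))
```

Object-dtype strings are deliberately left unhandled here and still raise ValueError; whether they should map to an empty string is a decision for the maintainers.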

Comment thread ilamb3_data/__init__.py
Comment thread ilamb3_data/__init__.py
Comment thread ilamb3_data/__init__.py
Comment thread ilamb3_data/__init__.py Outdated
Comment thread ilamb3_data/__init__.py Outdated
Comment thread ilamb3_data/__init__.py
Comment thread data/CAVM-2-0/convert.py Outdated
Comment thread data/CAVM-2-0/convert.py
Comment thread data/CAVM-2-0/convert.py
Comment thread data/CAVM-2-0/convert.py Outdated
@msteckle msteckle changed the title NEWDATA(CAVM-2-0): Add var landCoverCat with attrs flag_values, flag_meanings, flag_descriptions; v20260331.nc NEWDATA(CAVM-2-0): Add var landCoverCat with attrs flag_values, flag_meanings, flag_descriptions, flag_colors; v20260331.nc Apr 1, 2026

@nocollier nocollier left a comment

I don't have strong feelings about this, but I wonder why we don't do something more like this:

Dimensions:            (lat: 4271, lon: 30556, bnds: 2, flag_values: 20)
Coordinates:
  * lat                (lat) float64 34kB 89.99 89.98 89.97 ... 39.71 39.7 39.69
  * lon                (lon) float64 244kB -180.0 -180.0 -180.0 ... 180.0 180.0
  * flag_values        (flag_values) int8 20B 1 2 3 4 5 21 ... 42 43 91 92 93 99
    flag_meanings      (flag_values) <U3 240B 'B1' 'B2a' 'B3' ... 'SW' 'GL' 'NA'
    flag_descriptions  (flag_values) <U50 4kB 'cryptogam_herb_barren' ... 'no...
Dimensions without coordinates: bnds
Data variables:
    landCoverCat       (lat, lon) float32 522MB 92.0 92.0 92.0 ... nan nan nan
    lat_bnds           (lat, bnds) float64 68kB 89.99 90.0 89.98 ... 39.68 39.69
    lon_bnds           (lon, bnds) float64 489kB -180.0 -180.0 ... 180.0 180.0
Attributes: (12/38)
    Conventions:                 CF-1.12 ODS-2.6
    activity_id:                 ILAMB
...

Your way may be most like the standard and if you are more comfortable keeping it that way, then that is fine with me. But it irks me that I cannot use the flag meanings/descriptions without post-processing them. That is, I have to read it and split by spaces for them to be useful at all. This seems silly when we can just store the arrays? We can chat about this.
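The post-processing described here amounts to splitting each attribute on whitespace and zipping the pieces with flag_values. A minimal sketch, using the attribute values from the header in this PR; the legend dictionary and its keys are illustrative names only.

```python
# Attribute values copied from the landCoverCat header in this PR.
flag_values = [1, 2, 3, 4, 5, 21, 22, 23, 24, 31, 32, 33, 34, 41, 42, 43, 91, 92, 93, 99]
flag_meanings = "B1 B2a B3 B4 B2b G1 G2 G3 G4 P1 P2 S1 S2 W1 W2 W3 FW SW GL NA"
flag_descriptions = (
    "cryptogam_herb_barren cryptogam_barren_complex "
    "non-carbonate_mountain_complex carbonate_mountain_complex "
    "cryptogam_barren_dwarf-shrub_complex graminoid_forb_cryptogam_tundra "
    "graminoid_prostrate_dwarf-shrub_forb_moss_tundra "
    "non-tussock_sedge_dwarf-shrub_moss_tundra "
    "tussock-sedge_dwarf-shrub_moss_tundra "
    "prostrate_dwarf-shrub_herb_lichen_tundra "
    "prostrate-hemi-prostrate_dwarf-shrub_lichen_tundra "
    "erect_dwarf-shrub_moss_tundra low-shrub_moss_tundra "
    "sedge-grass_moss_wetland_complex sedge_moss_dwarf-shrub_wetland_complex "
    "sedge_moss_low-shrub_wetland_complex fresh_water salt_water glacier non-arctic"
)

codes = flag_meanings.split()
descriptions = flag_descriptions.split()
# Guard against a silent mismatch before zipping.
assert len(flag_values) == len(codes) == len(descriptions)

# One record per flag value; underscores become spaces for display.
legend = {
    value: {"code": code, "description": desc.replace("_", " ")}
    for value, code, desc in zip(flag_values, codes, descriptions)
}

print(legend[24])
```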

msteckle commented Apr 9, 2026

Decided to stick with conventions as closely as possible. So, the flag_* attributes will need to be parsed!
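With the attributes kept as CF-style strings, turning raw cell codes into labels takes one parsing step on the reader's side. A sketch under assumptions: the 2x3 grid of codes is made up for illustration, and -127 is the _FillValue from the header earlier in this PR.

```python
FILL = -127  # _FillValue from the landCoverCat header

# Attribute values copied from the landCoverCat header in this PR.
flag_values = [1, 2, 3, 4, 5, 21, 22, 23, 24, 31, 32, 33, 34, 41, 42, 43, 91, 92, 93, 99]
flag_meanings = "B1 B2a B3 B4 B2b G1 G2 G3 G4 P1 P2 S1 S2 W1 W2 W3 FW SW GL NA"

code_to_label = dict(zip(flag_values, flag_meanings.split()))

# Made-up sample of raw cell values, including one fill cell.
grid = [
    [92, 92, 1],
    [21, FILL, 99],
]

# Fill cells (and any unknown code) map to None.
labels = [[code_to_label.get(code) for code in row] for row in grid]
print(labels)
```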

@msteckle msteckle merged commit 0357606 into main Apr 9, 2026
1 check failed
@github-project-automation github-project-automation Bot moved this from In review to Done in ILAMB Development Collective Apr 9, 2026

Labels: new dataset (New script to process a dataset for ILAMB3)
Linked issue: New Dataset: Circumpolar Arctic Vegetation Map (CAVM)
3 participants